Analysis of Sharing Overhead in Shared Memory Multiprocessors

Authors

Pierfrancesco Foglia

Roberto Giorgi

Cosimo Antonio Prete

Abstract

A cache memory contributes to both hiding memory latency and reducing the traffic on the processor interconnection network of shared memory multiprocessors, but it introduces the coherence problem. A coherence protocol [Tomasevic93] is required to guarantee the coherence of the cached copies. An adequate choice of the coherence protocol is critical for performance: when the number of nodes exceeds a critical value, the processor interconnection network saturates because of both cache misses and coherence operations. The three main classes of coherence protocols are write-update (WU), write-invalidate (WI), and hybrid. A WU protocol updates the remote copies on each write to a shared copy, whereas a WI protocol invalidates the remote copies so that it does not have to update them. A hybrid protocol uses both WU and WI strategies to combine the best aspects of each. The frequency and the pattern of accesses to shared copies influence the coherence overhead. Since the access pattern to shared data varies from application to application, neither WI nor WU is the better strategy for maintaining cache coherence in all cases. Results [Veenstra94] show that, although the hybrid protocol [Cox93, Prete90, Prete95b, Stenstrom93] does not offer any significant advantage over the best choice of pure protocol for a particular application, it may offer optimal performance over a wider range of applications than any single pure protocol. An optimal selection of the coherence protocol can be made by considering the traffic induced by the two approaches under different sharing patterns. The coherence overhead induced by a WU protocol is due to all the operations needed to update the remote copies. A WI protocol, by contrast, invalidates the remote copies, and processors then miss when they access an invalidated copy. Both invalidations and block fetches (due to invalidation misses) therefore contribute to the coherence overhead of WI protocols. By considering the cost for ...
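The two sources of overhead described in the abstract can be illustrated with a small counting model. The sketch below is not the authors' method: it assumes a single shared block, infinite caches, and a trace of (processor, operation) pairs, and the names `Traffic`, `simulate`, and `coherence_cost`, as well as the cost weights, are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Traffic:
    cold_misses: int = 0          # first fetch of the block by a processor
    updates: int = 0              # WU: update transactions to remote copies
    invalidations: int = 0        # WI: invalidate transactions
    invalidation_misses: int = 0  # WI: re-fetches after an invalidation


def simulate(trace, protocol):
    """Count coherence events for one shared block.

    trace is a list of (cpu, op) pairs with op in {"R", "W"}.
    Simplifying assumptions: infinite caches, one block, and a
    write miss is modeled as a fetch followed by the write.
    """
    if protocol not in ("WU", "WI"):
        raise ValueError("protocol must be 'WU' or 'WI'")
    holders = set()    # processors currently holding a valid copy
    ever_held = set()  # processors that have held a copy at some point
    t = Traffic()
    for cpu, op in trace:
        if cpu not in holders:
            if cpu in ever_held:
                t.invalidation_misses += 1
            else:
                t.cold_misses += 1
            holders.add(cpu)
            ever_held.add(cpu)
        # A write only causes coherence traffic if remote copies exist.
        if op == "W" and holders - {cpu}:
            if protocol == "WU":
                t.updates += 1         # remote copies stay valid
            else:
                t.invalidations += 1   # remote copies are dropped
                holders = {cpu}
    return t


def coherence_cost(t, update=1, invalidate=1, fetch=4):
    """Weighted coherence overhead; the weights are illustrative assumptions."""
    return (t.updates * update
            + t.invalidations * invalidate
            + t.invalidation_misses * fetch)


# Producer-consumer sharing: P0 writes, P1 reads, repeatedly.
producer_consumer = [(0, "W"), (1, "R")] * 3
# Write run: P0 reads once, then P1 writes repeatedly and P0 never re-reads.
write_run = [(0, "R")] + [(1, "W")] * 4

for name, trace in [("producer-consumer", producer_consumer),
                    ("write run", write_run)]:
    for protocol in ("WU", "WI"):
        t = simulate(trace, protocol)
        print(name, protocol, t, "cost =", coherence_cost(t))
```

With these assumed weights, the producer-consumer trace costs 2 under WU and 10 under WI, while the write run costs 4 under WU and 1 under WI, which matches the abstract's point that neither pure protocol is better for every sharing pattern.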

Similar resources

An Algorithm for the Classification of Coherence Related Overhead in Shared-Bus Shared-Memory Multiprocessors

This paper describes an algorithm for the classification of coherence overhead in shared-bus shared-memory multiprocessors. This algorithm is applicable to Write Update, Write Invalidate and Hybrid protocols and models the effect of finite size (real) cache. It differs from previous classifications because it classifies any sources of coherence overhead, i.e. invalidation miss, invalidate and w...

False Sharing Elimination by Selection of Runtime Scheduling Parameters

False sharing can be a source of significant overhead on shared-memory multiprocessors. Several program restructuring techniques to reduce false sharing have been proposed in past work. In this paper, we propose an approach for elimination of false sharing based solely on selection of runtime schedule parameters for parallel loops. This approach leads to more portable code since only the schedul...

ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols

Directories have been used to maintain cache coherency in shared memory multiprocessors with private caches. The traditional full map directory tracks the exact caching status for each shared memory block and is designed to be efficient and simple. Unfortunately, the inherent directory size explosion makes it unsuitable for large-scale multiprocessors. In this paper, we propose a new directory...

Sharing Speculation: A Mechanism for Low-Latency Access to Falsely Shared Data

False sharing of data is an important phenomenon affecting performance in shared memory multiprocessors. False sharing results in unnecessary coherency overhead by causing invalidation of the shared cache line, and increasing the latency of the load accessing the cache line. As microprocessors incorporate increasingly large caches with large cache lines, false sharing will become more common, a...

An Efficient Tree Cache Coherence Protocol for Distributed Shared Memory Multiprocessors

Directory schemes have long been used to solve the cache coherence problem for large scale shared memory multiprocessors. In addition, tree-based protocols have been employed to reduce the directory size and the invalidation latency for a large degree of data sharing in the system. However, the existing tree-based protocols involve a very high communication overhead for maintaining a balanced ...

Simulation study of memory performance of SMP multiprocessors running a TPC-W workload

The infrastructure to support electronic commerce is one of the areas where more processing power is needed. A multiprocessor system can offer advantages for running electronic commerce applications. The memory performance of an electronic commerce server, i.e. a system running electronic commerce applications, is evaluated in the case of shared-bus multiprocessor architecture. The software arc...

Publication date: 1998
